Yong Man Ro

Decoding Strategies for Diffusion-Based ASR: A Systematic Evaluation of Confidence-Based Thresholding

May 28, 2026

Diffusion Large Language Models for Visual Speech Recognition

May 27, 2026

Robust Grounding with MLLMs against Occlusion and Small Objects via Language-guided Semantic Cues

Apr 27, 2026

STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding

Mar 29, 2026

Recursive Think-Answer Process for LLMs and VLMs

Mar 03, 2026

MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

Jan 29, 2026

Robust Egocentric Visual Attention Prediction Through Language-guided Scene Context-aware Learning

Jan 05, 2026

GCAgent: Long-Video Understanding via Schematic and Narrative Episodic Memory

Nov 15, 2025

Unified Reinforcement and Imitation Learning for Vision-Language Models

Oct 22, 2025

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

Jun 18, 2025